OCR
This page describes the API for the OCR (Optical Character Recognition) service and for combining OCR on images with ChatGPT (large language models applied to the OCR results). The API follows the general rules of the Ximilar API as described in the section First steps.
If you need a more detailed analysis of the text, use the /v2/read_gpt endpoint.
Typical use cases for this endpoint include:
- Data-mining information from invoices.
- Automatically reading text from trading cards, posters, comics, ...
- Reading nutrition information from labels on food products, ...
Just specify the prompt field (str) in each record. This prompt is sent to the ChatGPT API together with full_text (the text read from the image by OCR).
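The example /v2/read_gpt response on this page shows how the prompt and full_text are combined in _gpt.full_prompt. The Python sketch below reproduces that layout locally, which can help when drafting prompts. The template is inferred from that single example and is an assumption, not a documented contract; compose_full_prompt is a hypothetical helper name.

```python
def compose_full_prompt(prompt: str, full_text: str) -> str:
    """Hypothetical helper mimicking the _gpt.full_prompt layout seen in the
    example response (inferred from one example, not a documented format)."""
    return f"Instruction: {prompt} \nInput: {full_text} \nResult:"


print(compose_full_prompt("what is the name of the card?", "BASIC MewV 180"))
```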
This service also supports the asynchronous API with webhooks. If you need to process a large number of requests per day or month (millions of requests) and don't need the results immediately, use our asynchronous requests with webhooks.
Endpoints
The service API has two endpoints:
https://api.ximilar.com/ocr/v2/read (for basic OCR reading)
https://api.ximilar.com/ocr/v2/read_gpt (for OCR reading with GPT analysis)
OCR Reading
Given a list of image records, this method returns the text read from the images by the OCR system. For each image, it predicts the positions of text regions (each defined by a polygon) together with the read text, based on the selected language. The result is stored in the _ocr field.
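Each entry in _ocr.texts carries a polygon of [x, y] points. When an axis-aligned box is more convenient (for cropping or drawing), it can be derived from the polygon as sketched below. The sketch assumes the points are pixel coordinates of the input image, which is consistent with the _width and _height values in the example response but is not stated explicitly; polygon_to_bbox is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def polygon_to_bbox(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing an _ocr text polygon.

    Assumes each point is [x, y] in image pixel coordinates.
    """
    xs = [float(p[0]) for p in polygon]
    ys = [float(p[1]) for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# Polygon of the "BASIC" text from the example response.
print(polygon_to_bbox([[285.0, 221.0], [384.0, 229.0], [382.0, 256.0], [283.0, 248.0]]))
# -> (283.0, 221.0, 384.0, 256.0)
```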
Required attributes
- Name: records
- Type: list of dict
- Description: A batch of JSON records (max 10). Each record represents one image and is defined by either _url or _base64.
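A record is defined either by _url or by _base64. The Python sketch below builds one record from a URL or from raw image bytes. It assumes _base64 takes the plain base64 encoding of the image file contents; check the First steps section for the exact accepted form. make_record is a hypothetical helper.

```python
import base64
from typing import Dict, Optional


def make_record(url: Optional[str] = None, image_bytes: Optional[bytes] = None) -> Dict[str, str]:
    """Build one image record from either a URL or raw image bytes (hypothetical helper)."""
    if (url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of url or image_bytes")
    if url is not None:
        return {"_url": url}
    # Assumption: _base64 holds the plain base64 encoding of the file contents.
    return {"_base64": base64.b64encode(image_bytes).decode("ascii")}


# Usage with a local file (hypothetical path):
# with open("card.jpeg", "rb") as f:
#     record = make_record(image_bytes=f.read())
```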
Optional attributes
- Name: lang
- Type: string
- Default: en
- Description: Language setting for the OCR model. Currently supported: "en" (English), "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).
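Because lang accepts only a few values and a batch holds at most 10 records, a client can check both before sending. The sketch below serializes a /ocr/v2/read request body and splits longer inputs into batches of 10. The language set is copied from the list above and may grow over time; build_read_payload and chunk are hypothetical helpers.

```python
import json
from typing import Dict, List

# Copied from the docs ("currently" supported); may change over time.
SUPPORTED_LANGS = {"en", "zh", "ko", "ja", "ru"}
MAX_RECORDS = 10  # documented maximum number of records per request


def build_read_payload(records: List[Dict], lang: str = "en") -> str:
    """Serialize a request body after basic client-side checks."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    if len(records) > MAX_RECORDS:
        raise ValueError(f"at most {MAX_RECORDS} records per request, got {len(records)}")
    return json.dumps({"lang": lang, "records": records})


def chunk(records: List[Dict], size: int = MAX_RECORDS) -> List[List[Dict]]:
    """Split a long list of records into batches that respect the limit."""
    return [records[i:i + size] for i in range(0, len(records), size)]
```

Each batch returned by chunk can then be passed to build_read_payload and sent as a separate request.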
Returns
HTTP status code 2XX if the method succeeded, or another HTTP status code if it failed. The body of the response is a JSON object (map) with the following fields:

- Name: records
- Type: list of dict
- Description: JSON array with the input records, each record enriched by the _ocr field containing the detected text with positions and probabilities.

- Name: status
- Type: dict
- Description: A JSON map (dictionary) with the status of the method processing. It contains these subfields: code (numeric code of the operation status; it follows the concept of HTTP status codes) and text (text describing the status code).
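The example responses also carry a per-record _status next to the top-level status. A client can check both levels as sketched below. Treating any non-2XX per-record code (or a missing _status) as a failure is a conservative assumption of this sketch; failed_records is a hypothetical helper.

```python
from typing import Any, Dict, List


def failed_records(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Raise if the whole call failed; otherwise return records whose
    _status.code is not 2XX (records without _status count as failed)."""
    status = response.get("status", {})
    code = status.get("code")
    if not isinstance(code, int) or not 200 <= code < 300:
        raise RuntimeError(f"request failed: {code} {status.get('text')}")
    return [
        record
        for record in response.get("records", [])
        if not 200 <= record.get("_status", {}).get("code", 0) < 300
    ]
```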
Request
curl https://api.ximilar.com/ocr/v2/read -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
"lang": "en",
"records": [
{
"_url": "__PATH_TO_IMAGE_URL__"
}
]
}'
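For reference, the same request can be built with the Python standard library. The sketch below only constructs the urllib.request.Request object (same URL, headers and body as the curl call); sending it needs network access and a valid token. make_read_request is a hypothetical helper.

```python
import json
import urllib.request

API_URL = "https://api.ximilar.com/ocr/v2/read"


def make_read_request(token: str, image_url: str, lang: str = "en") -> urllib.request.Request:
    """Build the same POST request as the curl example (not sent yet)."""
    body = json.dumps({"lang": lang, "records": [{"_url": image_url}]}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


# Sending it (requires network access and a valid token):
# with urllib.request.urlopen(make_read_request("__API_TOKEN__", "__PATH_TO_IMAGE_URL__")) as resp:
#     result = json.load(resp)
```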
Response
{
"records": [
{
"_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
"_status": {
"code": 200,
"text": "OK",
"request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6"
},
"_id": "4d903663-a348-4d9b-be76-4df38df7d66a",
"_width": 1140,
"_height": 1520,
"_ocr": {
"texts": [
{
"polygon": [
[285.0, 221.0],
[384.0, 229.0],
[382.0, 256.0],
[283.0, 248.0]
],
"text": "BASIC",
"prob": 0.9564334750175476
}
],
"full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
"lang": "en",
"lang_name": "english"
}
}
],
"status": {
"code": 200,
"text": "OK",
"request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6",
"proc_id": "1a088e14-714a-41bd-9a89-4d732becafcd"
},
"statistics": {
"processing time": 5.054640769958496
}
}
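Each item in _ocr.texts has a prob value. The sketch below keeps only the items whose prob reaches a threshold. The default of 0.8 is an arbitrary illustrative value, not a recommendation from the service; confident_texts is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def confident_texts(record: Dict[str, Any], min_prob: float = 0.8) -> List[Tuple[str, float]]:
    """List (text, prob) pairs from a record's _ocr.texts with prob >= min_prob."""
    texts = record.get("_ocr", {}).get("texts", [])
    return [(t["text"], t["prob"]) for t in texts if t.get("prob", 0.0) >= min_prob]


# Trimmed record from the example response above.
record = {
    "_ocr": {
        "texts": [{"polygon": [], "text": "BASIC", "prob": 0.9564334750175476}],
        "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
    }
}
print(confident_texts(record))
```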
OCR Reading with GPT Analysis
Given a list of image records, this method reads the text with OCR and then analyzes it with ChatGPT based on your input prompt. For each image, it predicts the positions of text regions (each defined by a polygon) together with the read text, based on the selected language. The results are stored in the _ocr and _gpt fields. This endpoint requires at least a business plan, activated via app.ximilar.com.
Required attributes
- Name: records
- Type: list of dict
- Description: A batch of JSON records (max 10). Each record represents one image and is defined by either _url or _base64. Each record must also contain a prompt field for calling the GPT service. Processing is limited to a maximum of 1000 tokens per record (image).
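A client-side check of these rules can catch mistakes before a request is sent. The sketch below validates the batch size, the image source and the presence of a prompt. It does not check the 1000-token limit, because the token count depends on the tokenizer used by the service; check_gpt_records is a hypothetical helper.

```python
from typing import Dict, List


def check_gpt_records(records: List[Dict]) -> None:
    """Raise ValueError if a /ocr/v2/read_gpt batch breaks the documented rules.

    The 1000-token limit is not checked here (tokenizer-dependent).
    """
    if len(records) > 10:
        raise ValueError("at most 10 records per request")
    for i, record in enumerate(records):
        if "_url" not in record and "_base64" not in record:
            raise ValueError(f"record {i}: needs _url or _base64")
        prompt = record.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError(f"record {i}: needs a non-empty prompt string")
```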
Optional attributes
- Name: lang
- Type: string
- Default: en
- Description: Language setting for the OCR model. Currently supported: "en" (English), "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).
Returns
HTTP status code 2XX if the method succeeded, or another HTTP status code if it failed. The body of the response is a JSON object (map) with the following fields:

- Name: records
- Type: list of dict
- Description: JSON array with the input records, each record enriched by the _ocr and _gpt fields containing the detected text and the GPT analysis results.

- Name: status
- Type: dict
- Description: A JSON map (dictionary) with the status of the method processing. It contains these subfields: code (numeric code of the operation status; it follows the concept of HTTP status codes) and text (text describing the status code).
Request
curl https://api.ximilar.com/ocr/v2/read_gpt -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
"records": [
{
"_url": "__PATH_TO_IMAGE_URL__",
"prompt": "based on the following result from ocr system what is the name and type of the card as json result ({'\''name'\'':'\'''\'', '\''type'\'': '\'''\''})"
}
],
"lang": "en"
}'
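In the curl call, every single quote inside the prompt must be escaped as '\'' because the JSON body is wrapped in single quotes for the shell. Building the body with json.dumps avoids that escaping entirely, as in this Python sketch of the same payload.

```python
import json

# Same prompt as the curl example; no shell escaping is needed here.
prompt = (
    "based on the following result from ocr system what is the name and type "
    "of the card as json result ({'name':'', 'type': ''})"
)
payload = json.dumps({
    "records": [{"_url": "__PATH_TO_IMAGE_URL__", "prompt": prompt}],
    "lang": "en",
})
print(payload)
```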
Response
{
"records": [
{
"_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
"prompt": "based on the following result from ocr system what is the name and type of the card as json result ({'name':'', 'type': ''})",
"_status": {
"code": 200,
"text": "OK",
"request_id": "5d735c9b-71e7-425f-a4c7-4423230b3748"
},
"_id": "903ae611-2df4-4385-989e-146fa4d0fd7a",
"_width": 1140,
"_height": 1520,
"_ocr": {
"texts": [...],
"full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
"lang": "en",
"lang_name": "english"
},
"_gpt": {
"full_prompt": "Instruction: based on the following result from ocr system what is the name and type of the card as json result ({'name':'', 'type': ''}) \nInput: BASIC MewV 180 ASTRIKE EnergyMix... \nResult:",
"result": "{'name': 'Mew V', 'type': 'Basic'}",
"json_result": {
"name": "Mew V",
"type": "Basic"
}
}
}
],
"status": {
"code": 200,
"text": "OK",
"request_id": "5d735c9b-71e7-425f-a4c7-4423230b3748",
"proc_id": "b366485a-8ed9-462d-9ade-cdb817cae03a"
},
"statistics": {
"processing time": 6.970479965209961
}
}
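The example _gpt object holds both the raw model answer (result) and a parsed version (json_result). The sketch below prefers json_result and falls back to parsing result, first as strict JSON and then as a Python literal, since the example result uses single quotes. Whether json_result is always present is not stated here, so the fallback is a defensive assumption; gpt_result is a hypothetical helper.

```python
import ast
import json
from typing import Any, Dict, Optional


def gpt_result(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the parsed GPT answer of one record, or None if none is usable."""
    gpt = record.get("_gpt", {})
    if isinstance(gpt.get("json_result"), dict):
        return gpt["json_result"]
    raw = gpt.get("result", "")
    # Fallback: try strict JSON, then a safe Python literal parse
    # (the example "result" uses Python-style single quotes).
    for parse in (json.loads, ast.literal_eval):
        try:
            value = parse(raw)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(value, dict):
            return value
    return None
```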